NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features
Abstract
This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese, or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve this objective, we not only rely on Wikipedia's distinguishing features (e.g., anchor link information and language links) but also develop a method that analyzes the semantic features of anchor texts in Chinese Wikipedia. In the linking phase, a Latent Dirichlet Allocation (LDA) model is used to compute a text similarity measure among the English Wikipedia articles. This novel approach to the word-to-links ambiguity issue shows encouraging results in the CrossLink-2 evaluation.
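The abstract does not say which similarity measure is computed over the LDA output, so the following is only a minimal sketch under stated assumptions: each English article is assumed to already have a topic-proportion vector from some trained LDA model, and cosine similarity is swapped in as an illustrative measure. The function names, the three-topic vectors, and the article titles are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_p * norm_q)


def rank_candidates(reference_topics, candidate_topics):
    """Return candidate titles sorted by descending similarity to the reference."""
    scored = [
        (cosine_similarity(reference_topics, vec), title)
        for title, vec in candidate_topics.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical topic proportions (assumed output of an LDA model, 3 topics).
reference = [0.7, 0.2, 0.1]  # an English article already linked from the source
candidates = {
    "Apple Inc.": [0.8, 0.1, 0.1],
    "Apple (fruit)": [0.1, 0.1, 0.8],
}

ranking = rank_candidates(reference, candidates)
print(ranking)
```

Under these assumptions, an ambiguous anchor would be linked to the candidate whose topic mixture is closest to that of the reference article. Other measures over topic distributions, such as Hellinger distance, would fit the same structure.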
Similar resources
NTCIR-10 CrossLink-2 Task: A Link Mining Strategy
At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.
Overview of the NTCIR-10 Cross-Lingual Link Discovery Task
This paper presents an overview of the NTCIR-10 Cross-Lingual Link Discovery (CrossLink-2) task. For the task, we continued using the evaluation framework developed for the NTCIR-9 CrossLink-1 task. Overall, recommended links were evaluated at two levels (file-to-file and anchor-to-file), and system performance was evaluated with three metrics: LMAP, R-Prec, and P@N.
KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia Using Explicit Semantic Analysis
This paper describes the methods used in the submission of the Knowledge Media Institute (KMI), The Open University, to the NTCIR-9 Cross-Lingual Link Discovery (CLLD) task entitled CrossLink. KMI submitted four runs for link discovery from English to Chinese; however, the developed methods, which utilise Explicit Semantic Analysis (ESA), are also applicable to other language combinations. Three of ...
WUST EN-CS Crosslink System at NTCIR-9 CLLD Task
This paper describes our work in NTCIR-9 on the task of Cross-Lingual Link Discovery (Crosslink/CLLD). The work focuses on two aspects of the task: (1) how to collect useful data for Crosslink and (2) how to use the data correctly and effectively. The system first uses online data collection and text mining in Chinese Wikipedia articles to build the basic Crosslink database...
A Single-step Machine Learning Approach to Link Detection in Wikipedia: NTCIR Crosslink-2 Experiments at KSLP
This study describes a link detection method to find relevant cross-lingual links from Korean Wikipedia documents to English ones at the term level. Earlier wikification approaches have used two independent steps for link disambiguation and link determination. This study seeks to merge these two separate steps into a single-step machine learning scheme. Our method at NTCIR-10 Korean-to-English CLLD t...